68 research outputs found

    CARBON: A Counterfactual Reasoning based Framework for Neural Code Comprehension Debiasing

    Previous studies have demonstrated that code intelligence models are sensitive to program transformations, among which identifier renaming is particularly easy to apply and effective. Simply renaming one identifier in source code can cause the models to output completely different results. Prior research generally mitigates the problem by generating more training samples. Such an approach is less than ideal since its effectiveness depends on the quantity and quality of the generated samples. Unlike these studies, we adjust models to explicitly distinguish the influence of identifier names on the results, called naming bias in this paper, thereby making the models robust to identifier renaming. Specifically, we formulate the naming bias with a structural causal model (SCM) and propose a counterfactual reasoning based framework named CARBON for eliminating the naming bias in neural code comprehension. CARBON explicitly captures the naming bias through multi-task learning in the training stage and reduces the bias by counterfactual inference in the inference stage. We evaluate CARBON on three neural code comprehension tasks: function naming, defect detection, and code classification. Experimental results show that CARBON achieves relatively better performance (e.g., +0.5% F1 score on the function naming task) than the baseline models on the original benchmark datasets, and significant improvement (e.g., +37.9% F1 score on the function naming task) on the datasets with identifiers renamed. The proposed framework provides a causal view for improving the robustness of code intelligence models.
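
The identifier-renaming transformation mentioned in this abstract can be illustrated with a minimal sketch. It assumes Python 3.9+ (for `ast.unparse`); the helper names `RenameIdentifier` and `rename_identifier` and the sample function `area` are hypothetical and do not come from the paper. The sketch shows only the semantics-preserving renaming itself, not CARBON's debiasing method.

```python
import ast


class RenameIdentifier(ast.NodeTransformer):
    """Rename every use of one variable or parameter name (hypothetical helper)."""

    def __init__(self, old: str, new: str):
        self.old = old
        self.new = new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Variable reads and writes inside function bodies.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        # Parameter declarations in function signatures.
        if node.arg == self.old:
            node.arg = self.new
        return node


def rename_identifier(source: str, old: str, new: str) -> str:
    """Return source code with `old` renamed to `new`; behavior is unchanged."""
    tree = RenameIdentifier(old, new).visit(ast.parse(source))
    return ast.unparse(tree)


original = "def area(width, height):\n    return width * height\n"
renamed = rename_identifier(original, "width", "v0")
print(renamed)
```

Both versions compute the same result; only the surface name differs, which is the kind of change the abstract says a robust model should tolerate.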

    Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors

    Although Python's dynamic type system makes it easier for developers to write programs, it also introduces type errors at run-time. Rule-based approaches exist for automatically repairing Python type errors. These approaches can generate accurate patches, but they require domain experts to design patch synthesis rules and suffer from low template coverage of real-world type errors. Learning-based approaches reduce the manual effort of designing patch synthesis rules. Among them, the prompt-based approach, which leverages the knowledge base of code pre-trained models via pre-defined prompts, obtains state-of-the-art performance in general program repair tasks. However, such prompts are manually defined and do not involve any specific clues for repairing Python type errors, resulting in limited effectiveness. How to automatically improve prompts with domain knowledge for type error repair is challenging yet under-explored. In this paper, we present TypeFix, a novel prompt-based approach that incorporates fix templates for repairing Python type errors. TypeFix first mines generalized fix templates via a novel hierarchical clustering algorithm. The identified fix templates indicate the common edit patterns and contexts of existing type error fixes. TypeFix then generates code prompts for code pre-trained models by employing the generalized fix templates as domain knowledge, in which the masks are adaptively located for each type error instead of being pre-determined. Experiments on two benchmarks, BugsInPy and TypeBugs, show that TypeFix successfully repairs 26 and 55 type errors, outperforming the best baseline approach by 9 and 14, respectively. In addition, the proposed fix template mining approach can cover 75% of developers' patches in both benchmarks, exceeding the best rule-based approach, PyTER, by more than 30%. Comment: This paper has been accepted by ICSE'2
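
To make the notion of a run-time type error concrete, here is a small illustrative sketch. The functions `join_path_buggy` and `join_path_fixed` are invented for this example; they are not drawn from BugsInPy, TypeBugs, or TypeFix's mined templates. The repair shown (an `isinstance` guard followed by a conversion) is just one common edit pattern a developer might apply, assumed here for illustration.

```python
def join_path_buggy(base, name):
    # Raises TypeError at run-time when `name` is not a str, e.g. an int.
    return base + "/" + name


def join_path_fixed(base, name):
    # One plausible developer fix: guard on the type, then convert.
    if not isinstance(name, str):
        name = str(name)
    return base + "/" + name


try:
    join_path_buggy("logs", 7)
except TypeError as exc:
    print("type error:", exc)

print(join_path_fixed("logs", 7))
```

The error surfaces only when the offending call executes, which is why such bugs can go unnoticed until run-time.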

    Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit

    Adapting Deep Learning (DL) techniques to automate non-trivial coding activities, such as code documentation and defect detection, has been intensively studied recently. Learning to predict code changes is one of the popular and essential lines of investigation. Prior studies have shown that DL techniques such as Neural Machine Translation (NMT) can benefit meaningful code changes, including bug fixing and code refactoring. However, NMT models may encounter a bottleneck when modeling long sequences and are thus limited in accurately predicting code changes. In this work, we design a Transformer-based approach, considering that the Transformer has proven effective in capturing long-term dependencies. Specifically, we propose a novel model named DTrans. To better incorporate the local structure of code, i.e., statement-level information in this paper, DTrans is designed with dynamically relative position encoding in the multi-head attention of the Transformer. Experiments on benchmark datasets demonstrate that DTrans can generate patches more accurately than the state-of-the-art methods, increasing the performance by at least 5.45%-46.57% in terms of the exact match metric on different datasets. Moreover, DTrans can locate the lines to change with 1.75%-24.21% higher accuracy than the existing methods.

    Generative Type Inference for Python

    Python is a popular dynamic programming language, evidenced by its ranking as the second most commonly used language on GitHub. However, its dynamic type system can lead to type errors, prompting researchers to explore automatic type inference approaches for Python programs. Rule-based type inference approaches can ensure the accuracy of predicted variable types, but they suffer from low coverage. Supervised type inference approaches, while feature-agnostic, require large, high-quality annotated datasets and are limited to pre-defined types. As zero-shot approaches, the cloze-style approaches reformulate the type inference problem as a fill-in-the-blank problem. However, their performance is limited. This paper introduces TypeGen, a few-shot generative type inference approach that incorporates static domain knowledge from static analysis. TypeGen creates chain-of-thought (COT) prompts by translating the type inference steps of static analysis into prompts based on the type dependency graphs (TDGs), enabling language models to learn from how static analysis infers types. By combining COT prompts with code slices and type hints, TypeGen constructs example prompts from human annotations. TypeGen requires only a few annotated examples to teach language models to generate similar COT prompts via in-context learning. Moreover, TypeGen enhances the interpretability of results through the use of the input-explanation-output strategy. Experiments show that TypeGen outperforms the best baseline, Type4Py, by 10.0% for argument type prediction and 22.5% for return value type prediction in terms of top-1 Exact Match, using only five examples. Furthermore, TypeGen achieves substantial improvements of 27% to 84% compared to the zero-shot performance of large language models with parameter sizes ranging from 1.3B to 175B in terms of top-1 Exact Match. Comment: This paper has been accepted by ASE'2